# Multilingual Visual Question Answering

EuroVLM 9B Preview

EuroVLM-9B-Preview is a multimodal vision-language model based on the long-context version of EuroLLM-9B, supporting multiple languages and visual tasks. It is currently a preview release.

Transformers Supports Multiple Languages

EraX VL 7B V1.5

EraX-VL-7B-V1.5 is a powerful multimodal model specializing in Optical Character Recognition (OCR) and Visual Question Answering (VQA), excelling in multilingual environments with particular expertise in Vietnamese.

Transformers Supports Multiple Languages

Trillion LLaVA 7B

Trillion-LLaVA-7B is a vision-language model (VLM) for image understanding, built on the Trillion-7B-preview foundation model.

Transformers Supports Multiple Languages

InternVL3 8B 6bit

InternVL3-8B-6bit is a vision-language model converted to MLX format, supporting multilingual image-text-to-text tasks.

Transformers Other

Colqwen2.5 3b Multilingual V1.0

A multilingual visual retrieval model based on Qwen2.5-VL-3B-Instruct and ColBERT strategy, supporting dynamic input image resolution and multilingual document retrieval.

Text-to-Image Supports Multiple Languages
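
The "ColBERT strategy" mentioned in the entry above refers to late-interaction retrieval: the query and each document page are encoded as sets of vectors, and relevance is the sum, over query vectors, of the best-matching document vector. The following is a minimal sketch of that scoring rule in plain Python. The function names (`maxsim_score`, `rank_documents`) and the toy two-dimensional embeddings are illustrative assumptions, not part of the model's actual API, and the real model produces much higher-dimensional embeddings from images and text.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """ColBERT-style late-interaction score (hypothetical helper).

    For each query vector, take its highest similarity to any document
    vector, then sum those maxima over all query vectors.
    """
    if not doc_vecs:
        return 0.0
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def rank_documents(
    query_vecs: Sequence[Vector], docs: Sequence[Sequence[Vector]]
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs sorted by descending score."""
    scored = [(i, maxsim_score(query_vecs, d)) for i, d in enumerate(docs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings, made up for illustration only.
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
page_b = [[0.1, 0.2], [0.3, 0.1]]

if __name__ == "__main__":
    for index, score in rank_documents(query, [page_a, page_b]):
        print(index, round(score, 3))
```

Because each query vector is matched independently, documents can be encoded ahead of time and only the cheap max-and-sum step runs at query time.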

Llama 3.2 11b Vision R1 Distill

Llama 3.2-Vision is a multimodal large language model developed by Meta, supporting image and text inputs, optimized for visual recognition, image reasoning, and description tasks.

Transformers Supports Multiple Languages

Centurio is an open-source multilingual large vision-language model supporting 100 languages, capable of processing image-to-text and text-to-text tasks.

Transformers Supports Multiple Languages

Paligemma2 10b Mix 448

PaliGemma 2 is a vision-language model based on Gemma 2, supporting image and text inputs to generate text outputs, suitable for various vision-language tasks.

Mblip Bloomz 7b

mBLIP is a multilingual vision-language model based on the BLIP-2 architecture, supporting image caption generation and visual question answering tasks in 96 languages.

Transformers Supports Multiple Languages

© 2025 AIbase